<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="X-UA-Compatible" content="IE=edge"><title> Connection between album genre & color - Experiment for Big-Data course · shawlley</title><meta name="description" content="Connection between album genre &amp; color - Experiment for Big-Data course - shawlley"><meta name="viewport" content="width=device-width, initial-scale=1"><link rel="icon" href="/icon.png"><link rel="stylesheet" href="/css/apollo.css"><link rel="search" type="application/opensearchdescription+xml" href="http://example.com/atom.xml" title="shawlley"><meta name="generator" content="Hexo 6.3.0"><link rel="alternate" href="/atom.xml" title="shawlley" type="application/atom+xml">
</head><body><div class="wrap"><header><a href="/" class="logo-link"><img src="/icon.png" alt="logo"></a><ul class="nav nav-list"><li class="nav-list-item"><a href="/" target="_self" class="nav-list-link">BLOG</a></li><li class="nav-list-item"><a href="/archives/" target="_self" class="nav-list-link">ARCHIVE</a></li><li class="nav-list-item"><a href="https://github.com/Shawlleyw" target="_blank" class="nav-list-link">GITHUB</a></li></ul></header><main class="container"><div class="post"><article class="post-block"><h1 class="post-title">Connection between album genre & color - Experiment for Big-Data course</h1><div class="post-info">Oct 29, 2022</div><div class="post-content"><p>This is an experiment in big-data analysis that uses Hadoop and its MapReduce framework.</p>
<p>In this experiment, we studied the connection between album genres and the colors of their covers, aiming to find the most common cover colors for each genre.</p>
<p><strong>This project is a big-data exercise with Hadoop MapReduce, done for a course assignment. Don’t take it seriously.</strong></p>
<span id="more"></span>

<h1 id="Environments-and-Dependencies"><a href="#Environments-and-Dependencies" class="headerlink" title="Environments and Dependencies"></a>Environments and Dependencies</h1><p><strong>Hadoop</strong></p>
<ul>
<li>hadoop-streaming</li>
<li>yarn</li>
<li>hdfs</li>
</ul>
<p><strong>Python</strong></p>
<ul>
<li>tqdm</li>
<li>scikit-learn (imported as <code>sklearn</code>)</li>
<li>matplotlib</li>
<li>pillow</li>
</ul>
<h1 id="MapReduce"><a href="#MapReduce" class="headerlink" title="MapReduce"></a>MapReduce</h1><p>We ran this experiment as a MapReduce job. Each album is a map task: a mapper processes it and generates one transient record. The reducers then collect these records, classify them by genre, and reduce each group to the final result.</p>
<p><img src="/images/genre-hue/map-reduce.svg" alt="map-reduce"></p>
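<p>The process above can be sketched in memory, without Hadoop. This is only an illustration of the map, shuffle and reduce phases: <code>run_mapreduce</code>, <code>toy_mapper</code>, <code>toy_reducer</code> and the sample albums are hypothetical names and data, not the code used in the experiment.</p>
<figure class="highlight python"><table><tr><td class="code"><pre>
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, mapper, reducer):
    """Run a map, shuffle and reduce pass in memory."""
    # Map: each input record may yield any number of (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle: sort by key so that equal keys become adjacent.
    pairs.sort(key=itemgetter(0))
    # Reduce: call the reducer once per key with all of its values.
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


def toy_mapper(album):
    # Hypothetical stand-in: a real mapper would compute the color from the cover.
    yield album["genre"], album["color"]


def toy_reducer(genre, colors):
    # Count how many albums were seen for this genre.
    return genre, len(colors)


albums = [
    {"genre": "Rock", "color": "black"},
    {"genre": "Jazz", "color": "blue"},
    {"genre": "Rock", "color": "red"},
]
print(run_mapreduce(albums, toy_mapper, toy_reducer))
</pre></td></tr></table></figure>
<p>With the sample data this prints <code>[('Jazz', 1), ('Rock', 2)]</code>. On a real cluster, the sort and grouping step is done by the framework between the map and reduce stages.</p>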
<p><strong>Map</strong></p>
<p>A mapper reads records from the dataset. For each album (one record), the mapper requests the cover URL, fetches the image, normalizes it, and extracts its main color using KMeans. The mapper then generates transient records, each consisting of a genre and a main color, which are fed to the reducers.</p>
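<p>To illustrate the color-extraction step, here is a minimal sketch of k-means on RGB pixels. The experiment uses KMeans from sklearn; the version below is a dependency-free stand-in written only to show the idea, and it skips fetching and normalizing the image. The function name <code>dominant_color</code> and its parameters (<code>k</code>, <code>iterations</code>, <code>seed</code>) are assumptions, not taken from the project.</p>
<figure class="highlight python"><table><tr><td class="code"><pre>
import random


def dominant_color(pixels, k=2, iterations=10, seed=0):
    """Return the rounded mean color of the largest k-means cluster.

    pixels is a non-empty list of (r, g, b) tuples with at least k distinct values.
    """
    rng = random.Random(seed)
    # Start from k distinct pixels chosen with a fixed seed for repeatability.
    centers = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        # Assignment step: put each pixel into the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in pixels:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            tuple(sum(ch) / len(cl) for ch in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    # The main color is the mean of the cluster that holds the most pixels.
    largest = max(clusters, key=len)
    return tuple(round(sum(ch) / len(largest)) for ch in zip(*largest))


# Six reddish pixels and two bluish ones: the main color should be reddish.
pixels = [(250, 0, 0)] * 3 + [(255, 5, 5)] * 3 + [(0, 0, 250)] * 2
print(dominant_color(pixels, k=2))
</pre></td></tr></table></figure>
<p>In a real mapper, the pixel list would come from the downloaded and resized cover, for example through Pillow, and the sklearn implementation would replace the hand-written loop.</p>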
<p><strong>Reduce</strong></p>
<p>Reducers read the transient records generated by the mappers, classify them by genre, and reduce the records of each genre to a final result. When all reducers finish, the result is dumped and can be visualized to gain insight.</p>
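<p>A reducer along these lines could look like the sketch below. It assumes that each transient record is a line of the form <code>genre&lt;TAB&gt;color</code> and that the lines arrive sorted by genre, which is how Hadoop Streaming delivers mapper output to reducers. The record format, the function name <code>reduce_records</code>, and the choice of the most frequent colors as <code>hues</code> are assumptions made for illustration.</p>
<figure class="highlight python"><table><tr><td class="code"><pre>
import json
from collections import Counter
from itertools import groupby


def reduce_records(lines, top_n=4):
    """Yield one JSON result per genre from sorted 'genre TAB color' lines."""
    # Split each non-empty line into a [genre, color] pair.
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # Consecutive rows with the same genre form one group.
    for genre, group in groupby(rows, key=lambda row: row[0]):
        counts = Counter(color for _, color in group)
        yield json.dumps({
            "genre": genre,
            "size": sum(counts.values()),
            "hues": [color for color, _ in counts.most_common(top_n)],
        })


sample = ["Jazz\tblue\n", "Rock\tblack\n", "Rock\tred\n", "Rock\tblack\n"]
for result in reduce_records(sample):
    print(result)
</pre></td></tr></table></figure>
<p>In a streaming job the same function would be called as <code>reduce_records(sys.stdin)</code>. For the sample above, the Rock group yields a size of 3 with the hues ordered as black, then red.</p>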
<h1 id="Data"><a href="#Data" class="headerlink" title="Data"></a>Data</h1><p><strong>Dataset</strong></p>
<p>The dataset <code>mard_metadata.json</code> comes from <a target="_blank" rel="noopener" href="https://www.upf.edu/web/mtg/mard">https://www.upf.edu/web/mtg/mard</a>. It contains many JSON records. Each record has two key-value pairs that we use: <code>imUrl</code>, the URL of the album cover image, and <code>root-genre</code>, the genre of the album.</p>
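<p>Reading the two fields from a record might look like the sketch below. It assumes one JSON object per line, which should be checked against the actual file layout. The helper name <code>parse_record</code> and the sample URL are made up for the example.</p>
<figure class="highlight python"><table><tr><td class="code"><pre>
import json


def parse_record(line):
    """Return (genre, cover_url) for one metadata line, or None if unusable."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    genre = record.get("root-genre")
    url = record.get("imUrl")
    # Skip records that lack either field, since the mapper needs both.
    if not genre or not url:
        return None
    return genre, url


line = '{"imUrl": "http://example.com/cover.jpg", "root-genre": "Rock"}'
print(parse_record(line))
</pre></td></tr></table></figure>
<p>Returning <code>None</code> for malformed or incomplete records lets a mapper skip them instead of failing the whole task.</p>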
<p><strong>Mapper Result</strong></p>
<p>The mapper result stores the temporary output of the mapping stage. It contains a large number of records, each holding a genre and a main color, which the reducers use to generate the final result.</p>
<p><strong>Reducer Result</strong></p>
<p>The reducer result is dumped in JSON format, with the shape shown below.</p>
<figure class="highlight json"><table><tr><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">    <span class="attr">&quot;genre&quot;</span><span class="punctuation">:</span> _<span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;size&quot;</span><span class="punctuation">:</span> _<span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;hues&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line">        _<span class="punctuation">,</span></span><br><span class="line">        _<span class="punctuation">,</span></span><br><span class="line">        _<span class="punctuation">,</span></span><br><span class="line">        _</span><br><span class="line">    <span class="punctuation">]</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure>


<h1 id="Build-and-Run"><a href="#Build-and-Run" class="headerlink" title="Build and Run"></a>Build and Run</h1><p>We ran this job on a simulated cluster composed of Docker containers.</p>
<h1 id="Result"><a href="#Result" class="headerlink" title="Result"></a>Result</h1><p>The histogram shows, for each album genre, the 4 major colors of its covers. The X axis lists the genres and the Y axis gives the number of albums we analyzed. Each column is drawn in 4 colors, which are the 4 main cover colors of that genre, and a thicker band reflects a higher proportion.</p>
<p>Note that we are not sure whether there is actually any connection between genres and cover colors, so this result may not be accurate at all.</p>
<p><strong>WE MADE IT JUST FOR FUN (AND AS PRACTICE WITH HADOOP MAPREDUCE) ;)</strong></p>
<p><img src="/images/genre-hue/result.png" alt="visualization"></p>
</div></article></div></main><footer><div class="paginator"><a href="/2022/12/15/A/" class="prev">PREV</a><a href="/2022/10/01/A-tool-for-NoC-visualization/" class="next">NEXT</a></div><div class="copyright"><p>© 2022 <a href="http://example.com">shawlley</a>, powered by <a href="https://hexo.io/" target="_blank">Hexo</a> and <a href="https://github.com/pinggod/hexo-theme-apollo" target="_blank">hexo-theme-apollo</a>.</p></div></footer></div><script async src="//cdn.bootcss.com/mathjax/2.7.0/MathJax.js?config=TeX-MML-AM_CHTML" integrity="sha384-crwIf/BuaWM9rM65iM+dWFldgQ1Un8jWZMuh3puxb8TOY9+linwLoI7ZHZT+aekW" crossorigin="anonymous"></script></body></html>